Exploring efficient methods to detect document similarities using Locality-Sensitive Hashing (LSH).
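This project uses Locality-Sensitive Hashing (LSH) to detect similar documents efficiently in a dataset of over 1,500 paragraphs. Shingling turns each paragraph into a set of short overlapping substrings, minhashing compresses each set into a compact signature, and the banding technique groups signatures so that only likely matches are compared. Together these steps surface pairs of similar documents without comparing every paragraph against every other.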
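To make the three steps concrete, here is a minimal sketch of shingling, minhashing, and banding using only the Python standard library. It is an illustration under stated assumptions, not the project's actual code: the function names, the choice of character 5-shingles, the MD5-based base hash, and the parameters (100 hash functions, 20 bands of 5 rows) are all hypothetical.

```python
import hashlib
import random
from collections import defaultdict
from itertools import combinations

PRIME = 4294967311  # a prime just above 2**32, used as the hash modulus


def shingles(text, k=5):
    """Return the set of character k-shingles of lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    if len(text) < k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def base_hash(shingle):
    """Map a shingle to a 32-bit integer (assumption: MD5 chosen only for determinism)."""
    return int.from_bytes(hashlib.md5(shingle.encode("utf-8")).digest()[:4], "big")


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) coefficients for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]


def minhash_signature(shingle_set, params):
    """Signature = minimum hash value over the set, one entry per hash function."""
    base = [base_hash(s) for s in shingle_set]
    return [min((a * x + b) % PRIME for x in base) for a, b in params]


def candidate_pairs(signatures, bands, rows):
    """Split each signature into bands; documents sharing any band become candidates."""
    pairs = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for doc_id, sig in enumerate(signatures):
            key = tuple(sig[band * rows:(band + 1) * rows])
            buckets[key].append(doc_id)
        for ids in buckets.values():
            pairs.update(combinations(ids, 2))
    return pairs


def jaccard(a, b):
    """Exact Jaccard similarity, used to verify candidate pairs."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    docs = [
        "Locality-sensitive hashing finds similar documents quickly.",
        "Locality-sensitive hashing finds similar documents quickly.",
        "A completely unrelated sentence about gardening in spring.",
    ]
    params = make_hash_params(num_hashes=100)
    sets = [shingles(d) for d in docs]
    sigs = [minhash_signature(s, params) for s in sets]
    for i, j in sorted(candidate_pairs(sigs, bands=20, rows=5)):
        print(i, j, round(jaccard(sets[i], sets[j]), 2))
```

With b bands of r rows each, a pair whose Jaccard similarity is s becomes a candidate with probability 1 - (1 - s^r)^b, so the band and row counts set the similarity level at which pairs start to be caught. Candidates are then checked with the exact Jaccard similarity to discard false positives.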
Click the link below to view the full implementation and analysis on GitHub:
View on GitHub